FLAIR: Federated Learning Annotated Image Repository
Cross-device federated learning is an emerging machine learning (ML) paradigm
where a large population of devices collectively train an ML model while the
data remains on the devices. This research field has a unique set of practical
challenges, and to systematically make advances, new datasets curated to be
compatible with this paradigm are needed. Existing federated learning
benchmarks in the image domain do not accurately capture the scale and
heterogeneity of many real-world use cases. We introduce FLAIR, a challenging
large-scale annotated image dataset for multi-label classification suitable for
federated learning. FLAIR has 429,078 images from 51,414 Flickr users and
captures many of the intricacies typically encountered in federated learning,
such as heterogeneous user data and a long-tailed label distribution. We
implement multiple baselines in different learning setups for different tasks
on this dataset. We believe FLAIR can serve as a challenging benchmark for
advancing the state of the art in federated learning. Dataset access and the
code for the benchmark are available at
https://github.com/apple/ml-flair.
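The federated setup the abstract describes — many users, each holding a small, heterogeneous shard of data, jointly training a multi-label classifier — can be sketched with a minimal FedAvg loop over a linear model with sigmoid outputs and binary cross-entropy. Everything below (shard sizes, dimensions, the `local_sgd` and `fedavg_round` helpers) is illustrative and not taken from the FLAIR benchmark code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def local_sgd(w, X, Y, lr=0.1, steps=5):
    """A few steps of gradient descent on a linear multi-label model:
    one independent sigmoid logit per label, binary cross-entropy loss."""
    for _ in range(steps):
        P = sigmoid(X @ w)              # (n_samples, n_labels)
        grad = X.T @ (P - Y) / len(X)
        w = w - lr * grad
    return w

def fedavg_round(w_global, user_shards):
    """One FedAvg round: each user trains locally on its own shard;
    the server averages results weighted by the user's sample count."""
    total = sum(len(X) for X, _ in user_shards)
    w_new = np.zeros_like(w_global)
    for X, Y in user_shards:
        w_new += (len(X) / total) * local_sgd(w_global.copy(), X, Y)
    return w_new

rng = np.random.default_rng(0)
d, k = 8, 3                             # feature dim, number of labels
w_true = rng.normal(size=(d, k))

# Heterogeneous users: shard sizes vary widely, mimicking a long-tailed
# population where a few users hold most of the data.
shards = []
for n in (50, 10, 3):
    X = rng.normal(size=(n, d))
    Y = (sigmoid(X @ w_true) > 0.5).astype(float)
    shards.append((X, Y))

w = np.zeros((d, k))
for _ in range(20):
    w = fedavg_round(w, shards)
```

The sample-count weighting means the largest shard dominates the average — one of the heterogeneity effects a benchmark like FLAIR is designed to surface.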
Population Expansion for Training Language Models with Private Federated Learning
Federated learning (FL) combined with differential privacy (DP) offers
machine learning (ML) training with distributed devices and with a formal
privacy guarantee. With a large population of devices, FL with DP produces a
performant model in a timely manner. However, for applications with a smaller
population, not only does the model utility degrade as the DP noise is
inversely proportional to population, but also the training latency increases
since waiting for enough clients to become available from a smaller pool is
slower. In this work, we therefore propose expanding the population with domain
adaptation techniques to speed up training and improve the final model
quality when training with small populations. We empirically demonstrate that
our techniques can improve the utility by 13% to 30% on real-world language
modeling datasets.
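The scaling argument above — utility degrades because the averaged DP noise is inversely proportional to the population — can be made concrete with the Gaussian mechanism applied to a sum of clipped client updates. This is a generic sketch of that relationship; the function name and constants are illustrative, not the paper's method:

```python
def noise_std_per_coordinate(clip_norm, noise_multiplier, cohort_size):
    """Per-coordinate std of DP noise on the *averaged* model update.

    Under the Gaussian mechanism, noise with std (noise_multiplier *
    clip_norm) is added to the sum of clipped updates, whose
    per-client sensitivity is clip_norm. Dividing the noised sum by
    the cohort size to form the average divides the noise too."""
    return noise_multiplier * clip_norm / cohort_size

C, z = 1.0, 1.0                     # clipping norm, noise multiplier
small_pop = noise_std_per_coordinate(C, z, cohort_size=100)
large_pop = noise_std_per_coordinate(C, z, cohort_size=10_000)
# Effective noise shrinks linearly in the number of participating
# clients, which is why small populations see degraded utility and
# why expanding the population helps.
```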
Cell-phone traces reveal infection-associated behavioral change
Epidemic preparedness depends on our ability to predict the trajectory of an epidemic and the human behavior that drives spread in the event of an outbreak. Changes to behavior during an outbreak limit the reliability of syndromic surveillance using large-scale data sources, such as online social media or search behavior, which could otherwise supplement healthcare-based outbreak-prediction methods. Here, we measure behavior change reflected in mobile-phone call-detail records (CDRs), a source of passively collected real-time behavioral information, using an anonymously linked dataset of cell-phone users and their date of influenza-like illness diagnosis during the 2009 H1N1v pandemic. We demonstrate that mobile-phone use during illness differs measurably from routine behavior: diagnosed individuals exhibit less movement than normal (1.1 to 1.4 fewer unique tower locations; [Formula: see text]), on average, in the 2 to 4 days around diagnosis, and place fewer calls (2.3 to 3.3 fewer calls; [Formula: see text]) while spending longer on the phone (41- to 66-second average increase; [Formula: see text]) than usual on the day following diagnosis. The results suggest that anonymously linked CDRs and health data may be sufficiently granular to augment epidemic surveillance efforts, and that infectious-disease-modeling efforts lacking explicit behavior-change mechanisms need to be revisited.
Keywords: call detail records; disease; influenza; outbreak; surveillance.
Funding: Alan Turing Institute; Engineering and Physical Sciences Research Council (EP/N510129/1); UK Research & Innovation (UKRI); Medical Research Council UK (MRC); European Commission; National Institute for Health Research (NIHR) Health Protection Research Unit in Evaluation of Interventions at the University of Brist
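The per-day metrics the study reports (unique tower locations, call counts, call durations) can be computed from raw CDR rows with simple grouping. A minimal sketch on synthetic records; the row layout and the `daily_metrics` helper are hypothetical, not the study's actual pipeline:

```python
from collections import defaultdict
from datetime import date

# Hypothetical CDR rows: (user_id, day, tower_id, call_duration_s)
records = [
    ("u1", date(2009, 6, 1), "t1", 60),
    ("u1", date(2009, 6, 1), "t2", 45),
    ("u1", date(2009, 6, 2), "t1", 120),  # day after "diagnosis": fewer towers,
    ("u1", date(2009, 6, 2), "t1", 110),  # same call count but longer calls
]

def daily_metrics(rows):
    """Group CDR rows by (user, day) and compute the three behavioral
    signals measured in the study: movement (unique towers visited),
    call frequency, and mean call duration."""
    towers = defaultdict(set)
    durations = defaultdict(list)
    for user, day, tower, dur in rows:
        towers[(user, day)].add(tower)
        durations[(user, day)].append(dur)
    return {
        key: {
            "unique_towers": len(towers[key]),
            "n_calls": len(durations[key]),
            "mean_duration_s": sum(durations[key]) / len(durations[key]),
        }
        for key in towers
    }

m = daily_metrics(records)
```

Comparing each user's metrics around diagnosis against their own baseline days is what lets a drop in movement or a rise in call duration register as infection-associated behavior change.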
Training Large-Vocabulary Neural Language Models by Private Federated Learning for Resource-Constrained Devices
Federated Learning (FL) is a technique to train models using data distributed
across devices. Differential Privacy (DP) provides a formal privacy guarantee
for sensitive data. Our goal is to train a large neural network language model
(NNLM) on compute-constrained devices while preserving privacy using FL and DP.
However, the DP noise introduced to the model increases as the model size
grows, which often prevents convergence. We propose Partial Embedding Updates
(PEU), a novel technique to decrease noise by decreasing payload size.
Furthermore, we adopt Low Rank Adaptation (LoRA) and Noise Contrastive
Estimation (NCE) to reduce the memory demands of large models on
compute-constrained devices. This combination of techniques makes it possible
to train large-vocabulary language models while preserving accuracy and
privacy.
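Of the techniques named above, LoRA is the easiest to sketch: a frozen pretrained weight matrix is augmented with a trainable low-rank product, so each FL round trains and transmits only the small factors rather than the full matrix — the same payload-shrinking logic that motivates PEU. A generic illustration with made-up dimensions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 512, 512, 8            # hypothetical layer sizes, LoRA rank

W = rng.normal(size=(d_in, d_out))      # frozen pretrained weight
A = rng.normal(size=(d_in, r)) * 0.01   # trainable low-rank factor
B = np.zeros((r, d_out))                # B starts at zero: adapter is a no-op

def forward(x, alpha=16.0):
    """LoRA forward pass: frozen W plus a scaled low-rank update A @ B.
    Only A and B are trained and sent each round, instead of the full
    d_in * d_out matrix."""
    return x @ W + (alpha / r) * (x @ A @ B)

full_params = W.size                    # what full fine-tuning would send
lora_params = A.size + B.size           # what LoRA sends
```

With these dimensions the trainable payload drops from 262,144 to 8,192 parameters per layer — a 32x reduction, which directly reduces both the on-device memory demand and the magnitude of DP noise needed per round.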